
Introduction

My exploratory data analysis, visualizations, interactive charts, animations, and other insights on the Airbnb data are all available here. I focus on Washington, D.C. for three reasons: I am originally from the Metro area, I am interested in real estate there, and the nation’s capital is among the most visited cities in the country.

The following are some questions that I aim to answer through my analysis:

  • Where are most of the Airbnb listings located in the Metro area?
  • How is Airbnb priced across the year?
  • How does the price vary across the week?
  • What is the occupancy rate by month?
  • Is there anything that stands out in the free-text part of the reviews as a common theme? What components of the renting experience do consumers enjoy, and which do they despise?

Description of the Data

The data is sourced from the Inside Airbnb website (http://insideairbnb.com/get-the-data.html), which hosts publicly available data from the Airbnb site.

The dataset comprises three main tables: listings, calendar, and reviews.

A quick glance at the data shows that:

  • 1,836 unique listings are provided for the Washington, D.C. area.
  • Over 30,000 reviews were left between November 2010 and December 2021.
  • The price of a listing ranges from $25 per night to over $10,000 per night.
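The price range above can only be computed once prices are numeric. The sketch below assumes that prices arrive as text such as "$1,250.00", which is how the Inside Airbnb files usually format them; the sample values and the `parse_price` helper are hypothetical and only illustrate the idea.

```python
def parse_price(text):
    """Convert a price string such as "$1,250.00" to a float."""
    return float(text.replace("$", "").replace(",", "").strip())

# Hypothetical sample values, not taken from the real dataset.
sample_prices = ["$25.00", "$149.00", "$10,000.00"]
nightly = [parse_price(p) for p in sample_prices]

# The cheapest and most expensive nightly prices in the sample.
print(min(nightly), max(nightly))
```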

Exploratory Data Analysis

In this section, we’ll go through the findings from our exploratory data analysis and visualizations and draw some preliminary insights. The section is broken into parts, each of which addresses the questions above using a different type of visualization.

Spatial Analysis

This section will use spatial visualizations to analyze various factors from our dataset and will answer questions about pricing and rating variances across different areas in DC.

This is the most basic interactive graph, with all of the listings in Washington, D.C. clustered together. Clicking a cluster zooms in and reveals the listings it contains. You can also click on an individual listing to see further information such as the Listing Name, Host Name, Property Price, Property Type, and Room Type. This visualization aids in the geographical exploration of the listings and gives a general idea of how they are scattered across neighborhoods. As seen on the map, most listings are concentrated in Dupont Circle, Trinidad, Bloomingdale, Eckington, Anacostia, and the areas near the Capitol.


Demand and Price Analysis

In this part, we look at the demand for Airbnb listings in Washington, D.C. and then study in more detail how prices change depending on the day of the week. We use the ‘number of reviews’ variable as an indicator of demand because we do not have data on the bookings made in the previous year. According to Airbnb, around half of guests leave reviews for the hosts/listings, so the number of reviews gives us a decent idea of demand.
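As a rough illustration of this proxy, the sketch below counts reviews per month and scales the count by the review rate. The sample dates are hypothetical stand-ins for the review dates in the reviews table, and both helper names and the default rate of 0.5 (the "around half" figure quoted above) are assumptions made for the example.

```python
from collections import Counter
from datetime import date

def reviews_per_month(review_dates):
    """Count reviews per (year, month); the count is a rough proxy for demand."""
    counts = Counter((d.year, d.month) for d in review_dates)
    return dict(sorted(counts.items()))

def estimate_stays(review_count, review_rate=0.5):
    """Scale a review count up to an estimated number of stays."""
    return review_count / review_rate

# Hypothetical review dates, not taken from the real dataset.
sample_dates = [
    date(2021, 9, 3),
    date(2021, 10, 1),
    date(2021, 10, 15),
    date(2021, 12, 24),
]
monthly = reviews_per_month(sample_dates)
for (year, month), count in monthly.items():
    print(year, month, count, estimate_stays(count))
```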

How is Airbnb priced across the year?

After seeing the pattern in demand, we wanted to see whether the pricing of the listings followed a similar trend. To answer this question, we used the data from the ‘calendar’ table to look at the daily average price of the listings over time.
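The daily average can be sketched as a simple group-by on the date. The example assumes each calendar row has been reduced to a (date, price) pair with a numeric price; the rows shown are hypothetical, and the function name is chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean

def daily_average_price(rows):
    """rows: iterable of (date_string, price) pairs, one per listing per night."""
    by_day = defaultdict(list)
    for day, price in rows:
        by_day[day].append(price)
    # Sort by ISO date string so the result reads as a time series.
    return {day: mean(prices) for day, prices in sorted(by_day.items())}

# Hypothetical calendar rows, not taken from the real dataset.
sample_rows = [
    ("2021-07-02", 200.0),
    ("2021-07-02", 100.0),
    ("2021-01-12", 90.0),
]
averages = daily_average_price(sample_rows)
print(averages)
```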

As we can see, the average price is higher between May and October.

In the graphs above, we can also notice two distinct bands of points, indicating that average prices on certain days were higher than on other days. To better understand this phenomenon, we’ll create a box plot showing average prices by weekday.

We can see that prices are higher on Fridays and Saturdays, i.e., for weekend stays.
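The weekday comparison behind the box plot can be approximated by grouping the daily averages by day of the week and summarizing each group. This is a minimal sketch that reports only the median of each group (the center line of a box plot); the input values are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import median

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def median_price_by_weekday(daily_avg):
    """daily_avg: dict mapping ISO date strings to that day's average price."""
    groups = defaultdict(list)
    for day, price in daily_avg.items():
        name = WEEKDAYS[date.fromisoformat(day).weekday()]
        groups[name].append(price)
    # Keep Monday-to-Sunday order and skip weekdays with no data.
    return {name: median(groups[name]) for name in WEEKDAYS if name in groups}

# Hypothetical daily averages, not taken from the real dataset.
sample_daily = {
    "2021-07-02": 210.0,  # Friday
    "2021-07-03": 215.0,  # Saturday
    "2021-07-05": 180.0,  # Monday
    "2021-07-09": 220.0,  # Friday
}
by_weekday = median_price_by_weekday(sample_daily)
print(by_weekday)
```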

Demand and Supply: Airbnb Customer Growth vs Listing Prices over time

The price does fluctuate: it tends to be high in the summer months and drops in the winter months.

Occupancy Rate by Month

I’ll end this section’s examination by looking at the occupancy forecast for the coming year. We use the ‘calendar’ table to determine the percentage occupancy for the next year, i.e., what proportion of listings had already been booked as of November 3, 2020 (the day the data was obtained).
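One way to compute such a percentage is sketched below. It assumes each calendar row carries an availability flag of "t" or "f" and treats every unavailable night as booked, which overstates occupancy whenever a host has simply blocked a date. The rows and the function name are hypothetical.

```python
from collections import defaultdict

def occupancy_by_month(rows):
    """rows: iterable of (date_string, available_flag) pairs with "t" or "f" flags."""
    booked = defaultdict(int)
    total = defaultdict(int)
    for day, available in rows:
        month = day[:7]  # "YYYY-MM" prefix of an ISO date
        total[month] += 1
        if available == "f":
            # Assumption: an unavailable night counts as a booked night.
            booked[month] += 1
    return {m: 100.0 * booked[m] / total[m] for m in sorted(total)}

# Hypothetical calendar rows, not taken from the real dataset.
sample_calendar = [
    ("2021-01-01", "f"),
    ("2021-01-02", "t"),
    ("2021-01-03", "t"),
    ("2021-01-04", "f"),
    ("2021-02-01", "f"),
]
occupancy = occupancy_by_month(sample_calendar)
print(occupancy)
```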

In the calendar heat map for 2021 and the dates booked so far for 2022, all months appear to have a similar level of bookings.

User Review (Textual Data) Mining

The dataset gives us a lot of information, but none of it is as insightful or as close to the customer as the reviews and feedback. If correctly mined, they can reveal a lot about the customer’s attitude, expectations, and how well those expectations were met. The review text must be cleaned extensively for the final result to make sense: for example, words must be stemmed; commas, full stops, percent signs, and other punctuation must be removed; and common English terms and stop words must be dropped.
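The cleaning steps listed above can be sketched as a small pipeline. The stop-word list here is a tiny hypothetical sample, and `crude_stem` is a deliberately naive suffix stripper standing in for a real stemmer, so this only shows the shape of the process.

```python
import re

# A tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "was", "were", "is", "it",
              "to", "of", "in", "we", "very"}

def crude_stem(word):
    """Strip a few common suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def clean_review(text):
    """Lowercase, drop punctuation and digits, remove stop words, then stem."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [crude_stem(w) for w in text.split() if w not in STOP_WORDS]

# Hypothetical review text, not taken from the real dataset.
tokens = clean_review("The rooms were clean, and the host responded 100% quickly!")
print(tokens)
```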

Comment Analysis Using Word Cloud

Let’s start by looking at the most common topics in the reviews; simply creating a word cloud should be enough. Word clouds take a frequency count of the words in the corpus as input and produce a visually appealing representation of the dominant (frequently occurring) words, with their size proportional to their frequency. We have over a million reviews, so we take a random sample, in this case 30,000 reviews. Although the sampled dataset is small compared to the original, it serves our purpose well because we only need the basic terms here. As we’ll see in the next section, further study of “positive” and “negative” reviews will require more data.
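The frequency count that feeds a word cloud can be sketched as follows. The example draws a random sample of reviews and counts word occurrences; the reviews are hypothetical, the function name and fixed seed are choices made for illustration, and rendering the cloud itself is left out.

```python
import random
from collections import Counter

def word_frequencies(reviews, sample_size, seed=0):
    """Count word occurrences in a random sample of the reviews."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = rng.sample(reviews, min(sample_size, len(reviews)))
    counts = Counter()
    for review in sample:
        counts.update(review.lower().split())
    return counts

# Hypothetical reviews, not taken from the real dataset.
sample_reviews = [
    "great location",
    "great host",
    "clean room great location",
]
freq = word_frequencies(sample_reviews, sample_size=30000)
print(freq.most_common(3))
```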



Building Word Vectors from Reviews

The word cloud constructed above is effective at showing what clients are looking for, but it is quite broad. Wouldn’t it be wonderful if we could find out more specific things, such as what people think about the room sizes?

  • What makes consumers “uncomfortable”?
  • What makes consumers “comfortable”?
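Once word vectors are available, questions like these are typically answered by looking up the words closest to a query word. The sketch below ranks neighbors by cosine similarity; the three-dimensional vectors are made up for illustration and are not the vectors trained on the reviews.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_words(vectors, query, k=3):
    """Return the k words whose vectors are most similar to the query word."""
    scores = [(w, cosine(vectors[query], v))
              for w, v in vectors.items() if w != query]
    ranked = sorted(scores, key=lambda s: s[1], reverse=True)
    return [w for w, _ in ranked[:k]]

# Hypothetical toy vectors, not learned from the review data.
toy_vectors = {
    "uncomfortable": [0.9, 0.1, 0.0],
    "cold": [0.8, 0.2, 0.1],
    "tiny": [0.7, 0.3, 0.0],
    "comfortable": [0.1, 0.9, 0.1],
    "clean": [0.0, 0.8, 0.2],
}
print(nearest_words(toy_vectors, "uncomfortable", k=2))
print(nearest_words(toy_vectors, "comfortable", k=1))
```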



Conclusion

While both client numbers and listing prices are on the rise, there are some fascinating seasonal trends to be found. The number of customer reviews submitted at a given time is a strong indicator of demand at that time. As the number of Airbnb users grows, one might expect the number of reviews to grow steadily from January to December each year, but we see a different pattern: every year, the number of reviews peaks around October and then drops sharply toward the end of the year. The holidays appear to be a plausible explanation for the drop in demand.

To operate a successful business, you must first understand your clients. If correctly mined, consumer reviews can provide a lot of information. Cleanliness, the neighborhood, and whether or not particular sites were “walkable” were some of the most prominent positive themes that made respondents “comfortable.” “Hosts” and “communication” are key themes, and filthy bedsheets and linens make a lasting impression. Room size (reported as “tiny,” “stuffy,” “claustrophobic”), temperature and heating concerns (“cold,” “hot,” “damp”), and safety issues (“nervous,” “unsafe,” “stressful”) were also unpleasant for guests.

Some code has been copied from work by Ankit Peshin, Sarang Gupta, and Ankita Agrawal.